Have We Witnessed a Real-Life Turing Test?



Did Deep Blue ace the Turing Test? Did it do much more? It seems that
the IBM creation not only beat the reigning World Champion Garry
Kasparov, but also took a large step, in some people's eyes, toward true
artificial intelligence.


IBM's announcement of its intention to "retire" the Deep Blue computer
from chess revived the interest of both the mass media and the general
public in the chess match between World Champion Garry Kasparov and
Deep Blue. In a sense, by refusing a rematch with Kasparov or a new match
with another grand master, IBM closes a chapter in the history of
artificial intelligence. AI researchers had long investigated building a
machine that could defeat a world-class chess champion, and now one had
done so. But what did this mean?

DIFFERING INTERPRETATIONS 

One interpretation, the mass media's, held that the match result (Deep
Blue 3 1/2 to Garry Kasparov 2 1/2) definitively proved the computer
intellectually superior to a human in a field previously considered the
exclusive domain of human intelligence: "those who had looked to Garry
Kasparov as the last hope could now only bemoan the coming days of
ascendant computers." 1 Technical publications devoted their attention to
a description of hardware-software synergy, computer algorithms,
strategy, methods of numerical evaluation of positions, and so on. 2-3

For AI professionals, a computer defeating a human in chess is probably
neither surprising nor really significant. After all, they contend, chess
can be described in terms of a nondeterministic alternating Turing
machine. 4 Despite the enormous number of possible positions and
available moves (there are 10^120 possible chess games by research
mathematician Claude Shannon's estimate), the task does not present a
challenging theoretical AI problem of NP-completeness. 5 There are many
well-developed AI strategies that limit the search for the best move to
an analysis of the most promising positions. Therefore, the progress in
logical and numerical methods of AI and a computer's computational speed
and available memory made the computer's victory inevitable. Deep Blue's
victory, then, was attributable to its ability to analyze 200 million
positions per second and a refined algorithm that accounted for
positional, in addition to material, advantage. 2 In summary, most AI
professionals conclude that the computer won by brute force, rather
than a sophisticated or original AI strategy.
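
The idea of limiting the search to the most promising positions can be
illustrated with alpha-beta pruning, a standard game-tree technique. The
sketch below is not Deep Blue's actual algorithm. The toy tree, the
function name alphabeta, and the leaf scores (standing in for an
evaluation that would combine material and positional terms) are all
assumptions made for illustration.

```python
import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of a toy game tree, skipping hopeless branches.

    A node is either a number (a leaf score from a hypothetical evaluation
    function) or a list of child nodes. Leaves actually examined are
    appended to `visited` so the effect of pruning can be observed.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    best = -math.inf if maximizing else math.inf
    for child in node:
        value = alphabeta(child, not maximizing, alpha, beta, visited)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # the opponent would never allow this line; stop searching it
    return best

# Hypothetical two-ply tree: three candidate moves, two replies each.
tree = [[3, 5], [2, 9], [0, 1]]
examined = []
best_score = alphabeta(tree, maximizing=True, visited=examined)
```

In this toy tree, two of the six leaves (9 and 1) are never examined, yet
the value returned is the same one a full search would find.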

What most AI experts have overlooked, though, is another aspect of the
match, which may signify a milestone in the history of computer science:
For the first time, a computer seems to have passed the Turing Test.

Table 1. Scores of participants in the 1998 Loebner Prize competition. A lower score indicates responses judged more human.

Respondents | Individual scores of the 10 judges | Median scores | Average scores

TURING TEST

Does the ability to play the highest level chess prove the existence of
intelligence in a computer? Alan Turing, British mathematician and one
of the founders of computer science, considered chess to be a beginning
step in the process of programming computers to be actually intelligent:

We may hope that machines will eventually compete with men in all
purely intellectual fields. But which are the best ones to start with?
Even this is a difficult decision. Many people think that an abstract
activity, like the playing of chess, would be best. 6

In the same article, written in 1950, he asked whether machines are
capable of thinking. He answered "yes," but the central question that
remained was how to determine if a computer could think. Turing
suggested that if the responses from the computer were indistinguishable
from those of a human, we could say that the computer was thinking.

The Turing Test consists of the following scenario: An interviewer
(sitting in a separate room) asks a series of questions that are
randomly directed to either a computer or a person. Based on the
answers, the interviewer must distinguish which of the two has answered
the question. If the interviewer is not able to distinguish between
them, then the computer is intelligent.
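
The scenario above can be sketched as a simple simulation. This is only
an illustration under stated assumptions: the function
run_turing_trials, the canned answers, and the keyword-based interviewer
are all hypothetical, and a real interviewer would of course judge far
richer exchanges.

```python
import random

def run_turing_trials(computer_answer, human_answer, interviewer,
                      trials=1000, seed=0):
    """Estimate how often an interviewer correctly names the respondent.

    Each trial directs a question at random to the computer or the person;
    the interviewer sees only the answer and must guess its source.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        source = rng.choice(["computer", "human"])
        answer = computer_answer() if source == "computer" else human_answer()
        if interviewer(answer) == source:
            correct += 1
    return correct / trials

# Hypothetical interviewer: treats stilted phrasing as the sign of a machine.
def interviewer(answer):
    return "computer" if "AFFIRMATIVE" in answer else "human"

# A machine with an obvious tell versus one whose answers match the human's.
obvious = run_turing_trials(lambda: "AFFIRMATIVE.", lambda: "Sure, why not?",
                            interviewer)
fluent = run_turing_trials(lambda: "Sure, why not?", lambda: "Sure, why not?",
                           interviewer)
```

The interviewer identifies the first machine every time. Against the
second, the interviewer does no better than a coin toss (an accuracy
near 0.5), which is what "not able to distinguish" means in practice.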

Loebner Prize 

The Loebner Prize is the first formal acknowledgment of the Turing
Test. 7 Hugh Loebner, New York philanthropist, and the Cambridge Center
for Behavioral Studies (Cambridge, Mass.) established the Loebner Prize
Competition in Artificial Intelligence in 1990. Loebner pledged a prize
of $100,000 for the first computer whose responses were
indistinguishable from those of a human.

The Computer Museum of Boston hosted the first
Loebner Prize competition of computer programs in 
November 1991. Each year since, Loebner has 
awarded a medal and $2,000 to the designer of the 
computer system that is the best entry relative to other 
entries that year, irrespective of its absolute success in 
passing the Turing Test. In accordance with the 
requirements stipulated by Loebner, the grand prize 
winner must deal with audiovisual input. 


An awards committee admits three to six programs to the contest based
on an initial screening, and a panel of five to 10 judges evaluates
them. Initially, the awards committee selected the judges from the
general public. For the 1993 competition, the judges were reporters from
major US publications, a much less docile and cooperative group of
questioners. Each judge had a chance to communicate with each program.
For some years, the contest constrained questions to a single narrow
topic (pets, for example).

1998 competition 

The most recent competition in 1998 did not limit the scope of the
questioning. Also, the panel of judges came from a variety of
backgrounds, including journalism, philosophy, education, computer
science, and social work. The 1998 competition took place at Australia's
Powerhouse Museum in Sydney, in conjunction with international natural
language and computational linguistics conferences hosted by Flinders
University. The four human respondents chosen to converse with the
judges via computer terminals included a 71-year-old, a journalist, a
teacher, and a primary school student. The judges ranked the
responsiveness of both humans and computer programs, using a scale where
1 indicated most human and 10, least human. Table 1 shows the judges'
individual scores and lists the respondents in order, with those judged
most human first.

Significantly, the difference between the most human computer and the
most computer-like human was only 0.6 of a point. The "Conversing with
Computers" sidebar gives a sample of the repartee between the second
judge and the computer judged the most human overall. As shown in
Table 1, the second judge gave the computer a score of three, the most
human score awarded to this particular machine.

In judging how successfully computers can imitate humans, we should
also consider whether humans can disguise themselves as computers
(according to the contest transcripts, some human respondents
definitely attempted to do so in the Loebner Prize contest). So far in
these contests, no computer has given responses totally
indistinguishable from a human, which would win the $100,000 grand prize
award. Also, judges, as a group, have not mistaken a human

28 


Computer 














Conversing with Computers 

In this limited sample, Judge 2 is conversing with the computer program
that won the 1998 Loebner Prize competition. The full transcript is
available at http://www.phm.gov.au/whatson/pcl.htm.

Judge: Greetings, I'm judge2.

Program: I'm not sure what to talk about with you. Are you a girl or a
boy?

Judge: Girl. How are you?

Program: Are you married? I'm doing just fine, thanks.

Judge: Would you like to talk about something interesting?

Program: MP is an abbreviation for Member of Parliament.

Judge: Right, why did you say that?

Program: I just felt like it.

Judge: Okay, did you hit a wrong button?

Program: Do you come from around here?

Judge: Yes, do you?

Program: No, I don't.

Judge: Where do you come from?

Program: I am from Earth.

Judge: Well, that's very specific indeed. What are your interests?

Program: Would you like to have a drink with me later? Let's not get
too personal.

Judge: Aren't you a Monty Python fan?

Program: Yes. I am a big Python fan.

Judge: Then what's the counter quote?

Program: I can't remember offhand.

Judge: It's "Nobody expects the Spanish inquisition!" Now how about my
five-minute argument?

Program: And now for something completely different. Are you fond of
cats?

Judge: No, but I'd like to buy some cheese.

Program: Have you seen any good movies lately?

Judge: Yes, saw the James Bond one a few weeks ago. What about you?

Program: I've been watching a lot of stuff on home video. It just gets
harder and harder to get out to a theatre somehow. I didn't even see
Star Trek: First Contact in the theater; I'm waiting for it to come out
on tape.


for a computer. On the basis of both median and average scores in a
Turing Test, it appears the judges can clearly distinguish humans from
computers.

A real test of intelligence or merely PR? 

There is, however, controversy surrounding the Loebner Prize. Some
opponents challenge the idea of the Turing Test as an adequate test of
intelligence because it relies solely on the ability to fool people.
Stuart M. Shieber, Gordon McKay Professor of Computer Science, Harvard
University, argues "that the competition has no clear purpose, that its
design prevents any useful outcome, and that such a competition is
inappropriate given the current level of technology." 8 Ned Block, a
professor of philosophy at MIT, has argued that the Turing Test is a
sorely inadequate test of intelligence because it relies solely on the
ability to fool people. 9 Marvin Minsky, Toshiba Professor of Media Arts
and Sciences at MIT, expressed his skepticism toward the competition by
proposing a counter prize of $100 to the person who could persuade
Loebner to end his contest. 10 Nevertheless, in all Loebner Prize
competitions some judges have mistaken computers for humans, and
conversely some have mistaken humans for computers.

Most of the criticism came from the artificial intelligence community.
Indeed, most of the tricks that worked had nothing to do with AI
methods, but were rather manipulations of subtle language techniques:

• the repetition of previous statements verbatim (subject to pronominal
adjustments);

• answering by repeating a judge's sentence, with pronouns transposed,
which is preceded by the introductory "Why do you need to tell me";

• asking, "Why do you ask that?" which, in effect, changes the level
and/or topic of conversation.
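
The tricks listed above need very little machinery, as the following
sketch suggests. It is not taken from any contest entry: the swap table
SWAPS and the functions transpose_pronouns and canned_reply are
hypothetical, and the pronoun handling is deliberately crude.

```python
import re

# Crude first/second-person swaps; a hypothetical table, far from complete.
SWAPS = {"i": "you", "me": "you", "my": "your", "am": "are",
         "you": "I", "your": "my"}

def transpose_pronouns(sentence):
    """Swap first- and second-person words, dropping punctuation."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return " ".join(SWAPS.get(word.lower(), word) for word in words)

def canned_reply(judge_sentence):
    """Answer a question with a deflection, a statement with an echo."""
    if judge_sentence.strip().endswith("?"):
        return "Why do you ask that?"
    return "Why do you need to tell me " + transpose_pronouns(judge_sentence) + "?"

echo = canned_reply("I like my cat.")
deflection = canned_reply("Are you a program?")
```

Given "I like my cat." the sketch replies "Why do you need to tell me
you like your cat?" without any model of cats, liking, or the judge,
which is exactly why such replies drew criticism.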

Besides, predictably enough, the judges' education, age, and, most
importantly, computer literacy and awareness played a pivotal role in
the judges' ability to evaluate contestants.

From my point of view, the competition underscored the multifaceted
characteristics of human personality and how the general public's view
of human intelligence differs from that of scientists. An average
citizen tends to pay more attention to the social aspects of
communication, while scientists put more emphasis on a human's logical
ability. The AI community tends to evaluate programs not by final
results, but rather by the complexity of internal algorithms. When it
turned out that these complex methods were not up to the challenge of
passing the Turing Test, then some members of the AI community attempted
to reject the competition along with the whole idea of the Turing Test.
In response, and along with the progress achieved by software
developers, contest administrators are gradually eliminating
restrictions in the Loebner competition. They are attempting to bring
the contest closer to the AI community and to combine it with scientific
conferences.

A REAL-LIFE TURING TEST

However, it was neither the complexity of an algorithm nor the power of
the computer that made Deep Blue's match victory so remarkable. It was
Garry
Kasparov's reaction that proved the computer's intelligence according
to Alan Turing's classical definition of artificial intelligence. This
recent chess match gave us an excellent example of a real-life Turing
Test. At numerous press conferences during and after the match, Kasparov
expressed doubts that he played against the computer itself. He implied
that there was some untoward behavior by the Deep Blue team, saying that
"I have no idea what's happening behind the curtain." Kasparov also
alluded to famous soccer player Diego Maradona, who allegedly scored a
goal with his hand (as a postgame slow-motion film suggested), though it
appeared (during the game) as if he had used his head. Kasparov stopped
short of directly accusing the Deep Blue team of human intervention in
the process of selecting moves, but went so far as to admit the
appearance of human intelligence in the computer's actions. It did
appear as if Kasparov confused the computer with a human.

Kasparov's suspicions were shared by some of his fellow grand masters.
But is it possible to draw a clear demarcation between a computer's and
a human's chess moves? Anjelina Belakovskaia is the US Women's Chess
Champion, International Grand Master, and an active proponent of
human-computer cooperation in chess. She says it was easier to
distinguish the moves of previous versions of computer chess programs
from human players because computers clearly paid much more attention to
material rather than positional advantage. In contrast, the latest
version of Deep Blue considered positional as well as material
advantages, played much more aggressively, and for these reasons could
be mistaken for a human.

Even earlier chess computers were difficult to distinguish from human
chess players. In 1991, Frederic Friedel, a chess journalist from
Germany and the author of a popular chess program, conducted an informal
version of the Turing Test in chess with Garry Kasparov. Kasparov's task
was to identify a computer, Deep Thought (an earlier version of Deep
Blue), by reviewing a database containing game records of a chess
tournament. In a fairly randomized experiment, Kasparov was able to
identify a computer among eight players in 50 percent of the rounds. If
in 1991 Garry Kasparov, an expert user and promoter of chess computers,
barely passed this test, then in 1997 it seems that he failed to
distinguish between human and artificial intellect. Did Deep Blue become
the first computer to pass a Turing Test on artificial intelligence? It
would seem so.
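
One way to read the 50 percent figure from the 1991 experiment is to
compare it with what a blind guess would achieve when picking one
computer out of eight players. The short calculation below is a sketch
only: the number of rounds is not given in the account above, so the
value of 10 used here is purely an assumption, as is the helper name
chance_of_at_least.

```python
from math import comb

def chance_of_at_least(successes, rounds, p):
    """Binomial probability of `successes` or more correct picks by pure guessing."""
    return sum(comb(rounds, k) * p**k * (1 - p)**(rounds - k)
               for k in range(successes, rounds + 1))

players = 8                 # from the text: one computer among eight players
guess_rate = 1 / players    # 0.125, the hit rate of a blind guess
rounds = 10                 # ASSUMPTION: the number of rounds is not reported
hits = rounds // 2          # a 50 percent hit rate, as reported
p_by_luck = chance_of_at_least(hits, rounds, guess_rate)
```

Under these assumed numbers a blind guesser would reach 50 percent in
fewer than one series out of a hundred, so the 1991 result suggests
imperfect but real discrimination, with half of the rounds still
misjudged.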

Alan Turing was probably right in considering chess the area of human
intellect most amenable to computer simulation. He was also right in
predicting that computers would demonstrate near-human intellect by the
end of the century. However, as computers have evolved since Turing's
time, the two intellects have tended to be regarded as very separate,
even adversarial.

Anjelina Belakovskaia thinks it is time to stop testing computers and
playing "us against them," and instead begin using their power in
collaboration. She is planning to participate in the first match against
another human chess player where both players are allowed to use the
help of computers as partners.

The possibilities arising from collaboration among humans and computers
are even more intriguing than their differences were almost 50 years
ago. I think Alan Turing would have agreed.